{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ad04e428",
   "metadata": {},
   "source": [
    "![](https://www.nuplan.org/static/media/nuPlan_final.3fde7586.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18ab8c9a",
   "metadata": {},
   "source": [
    "# NuPlan Advanced Model Training Tutorial\n",
    "\n",
    "This notebook will cover the details involved in training a planning model in the NuPlan framework. This notebook is a more detailed deep dive into the NuPlan architecture, and covers the extensibility points that can be used to build customized models in the NuPlan framework.\n",
    "\n",
    "## Table of Contents\n",
    "1. [NuPlan training pipeline architecture](#architecture)\n",
    "2. [Accessing NuPlan data](#data_access)\n",
    "3. [Writing custom preprocessing steps for model training](#custom_preprocessing)\n",
    "4. [Writing custom models](#custom_models)\n",
    "5. [Writing custom loss functions and training metrics](#custom_loss)\n",
    "6. [Configuring the pipeline](#configuring)\n",
    "7. [Running training](#run_training)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d30b55e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from tutorials.utils.tutorial_utils import setup_notebook\n",
    "\n",
    "setup_notebook()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1f4a410",
   "metadata": {},
   "source": [
    "# NuPlan Training Pipeline Architecture<a name=\"architecture\"></a>\n",
    "\n",
    "A high level data flow diagram for the training pipeline can be seen below.\n",
    "\n",
    "![](media/nuplan_flow.svg)\n",
    "\n",
    "The training pipeline begins with an [`AbstractScenario`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/scenario_builder/abstract_scenario.py) object, which is an abstraction layer for the underlying dataset containing all of the information present about the state of the world and the vehicle. For example, the `AbstractScenario` will contain information such as the current ego pose, the poses of other vehicles in the world, and the map. More information about the `AbstractScenario` object can be found in the section [Accessing NuPlan Data](#data_access).\n",
    "\n",
    "Scenarios are passed to [`FeatureBuilder`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/preprocessing/feature_builders/abstract_feature_builder.py) and [`TargetBuilder`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/preprocessing/target_builders/abstract_target_builder.py) classes. These two classes are responsible for performing preprocessing steps for building input features to models. For example, a feature builder might [encode the relevant information present in a scenario into a multi-channel raster image](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/preprocessing/feature_builders/raster_feature_builder.py). The purpose of these two classes are covered in much more detail in the section about [writing custom preprocessing steps](#custom_preprocessing).\n",
    "\n",
    "After the scenario has been preprocessed, the features will then be passed to the [`TorchModuleWrapper`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/torch_module_wrapper.py) class for inference. This class contains the meat of the model, and transforms the preprocessed features into trajectory predictions. For example, the wrapper may [take the multi-channel raster image and pass it through a ResNet model](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/models/raster_model.py). Deriving from the `TorchModuleWrapper` class will be covered in more detail in the section about [writing custom models](#custom_models).\n",
    "\n",
    "After the predictions are generated, then the metrics and loss functions are evaluated.  Loss functions are defined by deriving from the [`AbstractObjective`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/objectives/abstract_objective.py) class, while metrics are defined by derifing from the [`AbstractTrainingMetric`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/metrics/abstract_training_metric.py) class. The interfaces between the two classes are similar, but the primary difference is that loss fuctions actively contribute to the backpropagation computation, while metrics are merely logged for review. Deriving from these classes to implement custom loss functions will be covered in more detail in the section about [writing custom loss functions and training metrics](#custom_loss).\n",
    "\n",
    "Finally, the entire training pipeline is configured via [hydra yaml configs](https://hydra.cc/). Utilizing hydra allows for configurations for various submodules to be easily composed and overridden via command line arguments. More information about configuring the training pipeline can be seen in the section about [configuring the training pipeline](#configuring)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10d57986",
   "metadata": {},
   "source": [
    "# Accessing NuPlan Data<a name=\"data_access\"></a>\n",
    "\n",
    "## Distribution of NuPlan DB files. \n",
    "Before using the NuPlan files, they must be downloaded to local disk. For more information about how to download and set up the database, consult the [database_setup](https://github.com/motional/nuplan-devkit/blob/master/docs/dataset_setup.md) documentation.\n",
    "\n",
    "Once downloaded, the following environment variables must be set to point the dataset. Here, is it assumed that the DB files have been downloaded to `/data/sets/nuplan/`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "81312924",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "NUPLAN_DATA_ROOT = os.getenv('NUPLAN_DATA_ROOT', '/data/sets/nuplan')\n",
    "NUPLAN_MAPS_ROOT = os.getenv('NUPLAN_MAPS_ROOT', '/data/sets/nuplan/maps')\n",
    "NUPLAN_DB_FILES = os.getenv('NUPLAN_DB_FILES', '/data/sets/nuplan/nuplan-v1.1/splits/mini')\n",
    "NUPLAN_MAP_VERSION = os.getenv('NUPLAN_MAP_VERSION', 'nuplan-maps-v1.0')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1a18271",
   "metadata": {},
   "source": [
    "## Database Schema\n",
    "NuPlan DB files are [Sqlite databases](https://www.sqlite.org/index.html). This means they can be queried using simple SQL commands. A detailed description of each field and table can be found [in this page](https://github.com/motional/nuplan-devkit/blob/master/docs/nuplan_schema.md). For convenience, the table diagram is replicated below:\n",
    "\n",
    "![](https://github.com/motional/nuplan-devkit/raw/master/docs/nuplan_schema.png)\n",
    "\n",
    "While there is a lot of information present in the diagram, the most important table is the `lidar_pc` table. Rows in this table are used to define a scenario. Auxiliary data (such as ego_pose, agents present in scene, etc), can be found by joining with this table via the primary keys. \n",
    "\n",
    "To visualize the data present in the DB files, there are a few tools that can be used:\n",
    "* **sqlite command line API**: DBs can be opened using `sqlite3 <db_file>.db`. From here, SQL queries can be directly executed against the databases. For example, running `.tables` will list the tables present. [More information about using the sqlite CLI can be seen here](https://www.sqlite.org/cli.html). \n",
    "* **sqlite browser**: This program provdes a GUI info\n",
    "\n",
    "There are 2 APIs that can be used for querying the database. First, we provide access via an ORM via [SqlAlchemy](https://www.sqlalchemy.org/). This is a convenient method for accessing the data, as it precludes any need to write SQL queries directly. However, it can be very slow, so it is only recommended for experiments involving small datasets. The other API directly executes SQL queries against the database files. This method requires writing explicit SQL queries, but can be much faster than the SqlAlchemy backend. Below, we will show how to use both backends to get some information from the database.  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3bc4937",
   "metadata": {},
   "source": [
    "### ORM Database API\n",
    "To use the ORM API, the ORM must first be instantiated. This is done by creating a [`NuPlanDBWrapper`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/database/nuplan_db_orm/nuplandb_wrapper.py) object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "350598f8",
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "Expected db load path to be file, dir or list of files/dirs, but got /data/sets/nuplan/nuplan-v1.1/mini",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[3], line 3\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mnuplan\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdatabase\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnuplan_db_orm\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mnuplandb_wrapper\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m NuPlanDBWrapper\n\u001b[0;32m----> 3\u001b[0m nuplandb_wrapper \u001b[38;5;241m=\u001b[39m \u001b[43mNuPlanDBWrapper\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m      4\u001b[0m \u001b[43m    \u001b[49m\u001b[43mdata_root\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mNUPLAN_DATA_ROOT\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m      5\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmap_root\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mNUPLAN_MAPS_ROOT\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m      6\u001b[0m \u001b[43m    \u001b[49m\u001b[43mdb_files\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mNUPLAN_DB_FILES\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m      7\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmap_version\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mNUPLAN_MAP_VERSION\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m      8\u001b[0m \u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/nuplan-devkit/nuplan/database/nuplan_db_orm/nuplandb_wrapper.py:169\u001b[0m, in \u001b[0;36mNuPlanDBWrapper.__init__\u001b[0;34m(self, data_root, map_root, db_files, map_version, max_workers, verbose)\u001b[0m\n\u001b[1;32m    166\u001b[0m logger\u001b[38;5;241m.\u001b[39minfo(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mLoaded maps DB\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m    168\u001b[0m \u001b[38;5;66;03m# Load nuPlan log DBs\u001b[39;00m\n\u001b[0;32m--> 169\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_log_db_mapping \u001b[38;5;241m=\u001b[39m \u001b[43mload_log_db_mapping\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    170\u001b[0m \u001b[43m    \u001b[49m\u001b[43mdata_root\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_data_root\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    171\u001b[0m \u001b[43m    \u001b[49m\u001b[43mdb_files\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_db_files\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    172\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmaps_db\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_maps_db\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    173\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmax_workers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_max_workers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    174\u001b[0m \u001b[43m    \u001b[49m\u001b[43mverbose\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_verbose\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    175\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    176\u001b[0m logger\u001b[38;5;241m.\u001b[39minfo(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mLoaded \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlog_dbs)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m log DBs\u001b[39m\u001b[38;5;124m'\u001b[39m)\n",
      "File \u001b[0;32m~/nuplan-devkit/nuplan/database/nuplan_db_orm/nuplandb_wrapper.py:92\u001b[0m, in \u001b[0;36mload_log_db_mapping\u001b[0;34m(data_root, db_files, maps_db, max_workers, verbose)\u001b[0m\n\u001b[1;32m     90\u001b[0m \u001b[38;5;66;03m# Discover all database filenames to be loaded.\u001b[39;00m\n\u001b[1;32m     91\u001b[0m load_path \u001b[38;5;241m=\u001b[39m data_root \u001b[38;5;28;01mif\u001b[39;00m db_files \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;28;01melse\u001b[39;00m db_files\n\u001b[0;32m---> 92\u001b[0m db_filenames \u001b[38;5;241m=\u001b[39m \u001b[43mdiscover_log_dbs\u001b[49m\u001b[43m(\u001b[49m\u001b[43mload_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m     94\u001b[0m \u001b[38;5;66;03m# Decide number of workers to use for creating the databases.\u001b[39;00m\n\u001b[1;32m     95\u001b[0m num_workers \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mmin\u001b[39m(\u001b[38;5;28mlen\u001b[39m(db_filenames), MAX_DB_LOADING_THREADS)\n",
      "File \u001b[0;32m~/nuplan-devkit/nuplan/database/nuplan_db_orm/nuplandb_wrapper.py:34\u001b[0m, in \u001b[0;36mdiscover_log_dbs\u001b[0;34m(load_path)\u001b[0m\n\u001b[1;32m     32\u001b[0m     db_filenames \u001b[38;5;241m=\u001b[39m [filename \u001b[38;5;28;01mfor\u001b[39;00m filenames \u001b[38;5;129;01min\u001b[39;00m nested_db_filenames \u001b[38;5;28;01mfor\u001b[39;00m filename \u001b[38;5;129;01min\u001b[39;00m filenames]\n\u001b[1;32m     33\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m---> 34\u001b[0m     db_filenames \u001b[38;5;241m=\u001b[39m \u001b[43mget_db_filenames_from_load_path\u001b[49m\u001b[43m(\u001b[49m\u001b[43mload_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m     36\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m db_filenames\n",
      "File \u001b[0;32m~/nuplan-devkit/nuplan/database/nuplan_db_orm/nuplandb_wrapper.py:64\u001b[0m, in \u001b[0;36mget_db_filenames_from_load_path\u001b[0;34m(load_path)\u001b[0m\n\u001b[1;32m     60\u001b[0m         db_filenames \u001b[38;5;241m=\u001b[39m [\n\u001b[1;32m     61\u001b[0m             \u001b[38;5;28mstr\u001b[39m(path) \u001b[38;5;28;01mfor\u001b[39;00m path \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28msorted\u001b[39m(Path(load_path)\u001b[38;5;241m.\u001b[39mexpanduser()\u001b[38;5;241m.\u001b[39miterdir()) \u001b[38;5;28;01mif\u001b[39;00m path\u001b[38;5;241m.\u001b[39msuffix \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124m.db\u001b[39m\u001b[38;5;124m'\u001b[39m\n\u001b[1;32m     62\u001b[0m         ]\n\u001b[1;32m     63\u001b[0m     \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m---> 64\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mExpected db load path to be file, dir or list of files/dirs, but got \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mload_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m     66\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m db_filenames\n",
      "\u001b[0;31mValueError\u001b[0m: Expected db load path to be file, dir or list of files/dirs, but got /data/sets/nuplan/nuplan-v1.1/mini"
     ]
    }
   ],
   "source": [
    "from nuplan.database.nuplan_db_orm.nuplandb_wrapper import NuPlanDBWrapper\n",
    "\n",
    "nuplandb_wrapper = NuPlanDBWrapper(\n",
    "    data_root=NUPLAN_DATA_ROOT,\n",
    "    map_root=NUPLAN_MAPS_ROOT,\n",
    "    db_files=NUPLAN_DB_FILES,\n",
    "    map_version=NUPLAN_MAP_VERSION,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d967b953",
   "metadata": {},
   "source": [
    "This wrapper serves as an abstraction over the list of `log_dbs` that are present on the machine. A log_db can be extracted from the wrapper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8719bb1",
   "metadata": {},
   "outputs": [],
   "source": [
    "log_db_name = \"2021.05.12.22.00.38_veh-35_01008_01518\"\n",
    "log_db = nuplandb_wrapper.get_log_db(log_db_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a9c3929",
   "metadata": {},
   "source": [
    "This return a [`NuplanDB`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/database/nuplan_db_orm/nuplandb.py) object, which can then be queried."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf68e643",
   "metadata": {},
   "outputs": [],
   "source": [
    "lidar_pcs = log_db.lidar_pc\n",
    "print(f\"The number of lidar_pcs in this log file is {len(lidar_pcs)}.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2ab1419",
   "metadata": {},
   "source": [
    "Foriegn Key links appear as property members for the returned model objects. These are fetched lazily on demand from the database. For example, to retrieve the state of the ego vehicle corresponding to the beginning of the scenario, one can write:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17c4b4cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_lidar_pc = lidar_pcs[0]\n",
    "example_ego_pose = example_lidar_pc.ego_pose\n",
    "print(f\"Lidar_pc token: {example_lidar_pc.token}.\")\n",
    "print(f\"Ego pose: <{example_ego_pose.x}, {example_ego_pose.y}, {example_ego_pose.z}>.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b54cb7bd",
   "metadata": {},
   "source": [
    "This method is convenient, but can be very slow if large amounts of data are pulled from the database. In this case, it is recommended to use the SQLAlchemy query builder. For example, consider the task of finding the first 10 lidar_pc tokens with a timestamp greater than 1620857890400393. Looping over the `lidar_pc` table is inefficient compared to using the query builder, like below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d66e0fd8",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "from nuplan.database.nuplan_db_orm.lidar_pc import LidarPc\n",
    "\n",
    "lidar_pc_objs = log_db.session.query(LidarPc) \\\n",
    "  .filter(LidarPc.timestamp > 1620857890400393) \\\n",
    "  .order_by(LidarPc.timestamp) \\\n",
    "  .limit(10) \\\n",
    "  .all()\n",
    "    \n",
    "print(f\"Number of lidar_pcs returned: {len(lidar_pc_objs)}.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aed4b3c7",
   "metadata": {},
   "source": [
    "Two more performance notes about the SqlAlchemy backend:\n",
    "* Although the above example uses `.all()` to coalesce the results to a list, this should be avoided whenever possible for optimal performance. Prefer to use generators and `for x in query()` idioms whenever possible.\n",
    "* In order to avoid memory leaks, ensure that all classes that maintain a long-standing copy of the database object call `add_ref()`, and call `remove_ref()` when they are done using the object. See the comments in the [db.py source code](https://github.com/motional/nuplan-devkit/blob/master/nuplan/database/common/db.py#L337) for more information about these functions."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ab03e76",
   "metadata": {},
   "source": [
    "### Direct SQL API\n",
    "To use the direct SQL API, first import the helper functions. There are two functions provided: [`execute_many`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/database/nuplan_db/query_session.py#L5), and [`execute_one`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/database/nuplan_db/query_session.py#L30). The difference between them is that `execute_one` ensures that at most 1 row will be returned from the query, and can be useful when expecting a single value. `execute_many` has no restrictions on the number of rows that can be returned.\n",
    "\n",
    "Then, write a sql query, passing the log file as a parameter. For example, here is a translastion of the query to get the number of tokens in a database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a4e3873",
   "metadata": {},
   "outputs": [],
   "source": [
    "from nuplan.database.nuplan_db.query_session import execute_one, execute_many\n",
    "\n",
    "query = \"\"\"\n",
    "SELECT COUNT(*) AS cnt\n",
    "FROM lidar_pc;\n",
    "\"\"\"\n",
    "\n",
    "result = execute_one(query, (), os.path.join(NUPLAN_DB_FILES, f\"{log_db_name}.db\"))\n",
    "print(f\"The number of lidar_pcs in this log files is {result['cnt']}.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0ce166b",
   "metadata": {},
   "source": [
    "All functionality exposed within SQLite can be used when writing these queries. Queries can be parameterized with the second argument to the API calls. For example, here's an example of getting the ego pose corresponding to a particular lidar_pc token:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d568cb1",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_token = \"e1e4ee25d1ff58f2\"\n",
    "query = \"\"\"\n",
    "SELECT ep.x AS ep_x,\n",
    "       ep.y AS ep_y,\n",
    "       ep.z AS ep_z,\n",
    "       lp.token AS token\n",
    "FROM ego_pose AS ep\n",
    "INNER JOIN lidar_pc AS lp\n",
    "  ON lp.ego_pose_token = ep.token\n",
    "WHERE lp.token = ?\n",
    "\"\"\"\n",
    "\n",
    "result = execute_one(\n",
    "    query, \n",
    "    (bytearray.fromhex(example_token),), \n",
    "    os.path.join(NUPLAN_DB_FILES, f\"{log_db_name}.db\")\n",
    ")\n",
    "\n",
    "print(f\"Lidar_pc token: {result['token'].hex()}.\")\n",
    "print(f\"Ego pose: <{result['ep_x']}, {result['ep_y']}, {result['ep_z']}>.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8280d649",
   "metadata": {},
   "source": [
    "A few notes about using the direct SQL API:\n",
    "* Most of the commonly-used queries are already available in [`nuplan_scenario`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/scenario_builder/nuplan_db/nuplan_scenario.py). The SQL queries used for this abstraction can be seen in [`nuplan_scenario_queries.py`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/database/nuplan_db/nuplan_scenario_queries.py). This is also a good place to look for more examples of complex SQL queries that can be executed against the database.\n",
    "* In the above examples, the log_db file path needed to be constructed manually, but when used with an existing [`abstract_scenario`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/scenario_builder/abstract_scenario.py) object, this can be obtained by accessing the propery [`log_name`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/scenario_builder/abstract_scenario.py#L30).\n",
    "* Slow queries can be debugged using [`EXPLAIN QUERY PLAN`](https://www.sqlite.org/eqp.html) in the sqlite3 CLI. This can ensure that the provided indexes are being used properly."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d3d0b58",
   "metadata": {},
   "source": [
    "# Writing Custom Preprocessing Steps<a name=\"custom_preprocessing\"></a>\n",
    "In NuPlan, custom preprocessing steps are encapsulated in `FeatureBuilders` and `TargetBuilders`. The purpose of these classes are to encapsulate the extraction of model-specific features from the model-agnostic [`AbstractScenario`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/scenario_builder/abstract_scenario.py) representation of the world.\n",
    "\n",
    "`FeatureBuilders` and `TargetBuilders` are developed by deriving from [`AbstractFeatureBuilder`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/preprocessing/feature_builders/abstract_feature_builder.py) and [`AbstractTargetBuilder`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/preprocessing/target_builders/abstract_target_builder.py), respectively. The difference between these two classes is that the `AbstractFeatureBuilder` builds features that are the inputs to the model, while `AbstractTargetBuilder` builds the outputs that a model will try to predict. For example, a feature builder may [create an encoding of the locations of the ego agent into a custom tensor](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/preprocessing/feature_builders/raster_feature_builder.py), while a target builder might [encode the future trajectory of the ego into a tensor](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/preprocessing/target_builders/ego_trajectory_target_builder.py). During the generation of the features, the feature data is typically represented as [numpy arrays](https://numpy.org/), and exists on the host CPU device. \n",
    "\n",
    "The primary output from these builders `AbstractModelFeature`, which derives from the [`AbstractModelFeature`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/preprocessing/features/abstract_model_feature.py) class. These classes are typically thin wrappers around `torch.Tensor` that contain convenience methods to allow for more readable member access as well as hooks to allow for custom collation and how to convert the numpy arrays to `torch.Tensors`.\n",
    "\n",
    "One important assumption made about NuPlan feature and target builders is that the output is deterministic. This allows features to be cached and re-used rather than repeatedly computed during the training loop. For non-deterministic transformations such as adding gaussian noise, consider writing an `DataAugmentor`. This class abstracts a feature transformation that is non-cachable. To write one of these, derive from [`AbstractAugmentor`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/data_augmentation/abstract_data_augmentation.py) and implement the provided interface. For example, a transformation that adds a small amount of gaussian noise to every agent can be seen in [simple_agent_augmentor.py](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/data_augmentation/simple_agent_augmentation.py)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7382879d",
   "metadata": {},
   "source": [
    "# Writing Custom Models<a name=\"custom_models\"></a>\n",
    "In NuPlan, planners are represented as a transformation from features to targets. Typically, this will involve running some sort of ML model on the input feature tensors to derive an output trajectory, but the framework allows for any sort of transformation.\n",
    "\n",
    "To write a custom model, derive from [`TorchModuleWrapper`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/torch_module_wrapper.py). For example, a custom model might accept a cusom encoding of the locations of objects of interest, and [run resnet to create a trajectory prediction](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/models/raster_model.py). The input tensors are typically on the target training device (e.g. the GPU) and represented as `torch.Tensors` when they reach the input of this class. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7c92d34",
   "metadata": {},
   "source": [
    "# Writing Custom Loss Functions and Training Metrics<a name=\"custom_loss\"></a>\n",
    "In NuPlan, both loss functions and training metrics encapsulate functions that compute a scalar value given model predictions and ground truth predictions. The difference between these two functions is that output from loss functions are actively used during the backpropagation computation, while the metrics are merely logged to tensorboard. Custom loss functions can be implemented by deriving from [`AbstractObjective`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/objectives/abstract_objective.py), and metrics can be implemented by deriving from [`AbstractTrainingMetric`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/metrics/abstract_training_metric.py)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "265cdf0f",
   "metadata": {},
   "source": [
    "# Configuring the Training Pipeline<a name=\"configuring\"></a>\n",
    "NuPlan uses [`hydra`](https://hydra.cc/) as a configuration management engine. Before beginning to write custom pipeline configurations, it may be helpful to read through the [getting started guide](https://hydra.cc/docs/intro/) to understand the basics of hydra configuration management.\n",
    "\n",
    "Most training pipelines begin by applying [`default_common.yaml`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/script/config/common/default_common.yaml) and [`default_experiment.yaml`](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/script/config/training/default_training.yaml). This contains information such as the optimizer to use, where data caches are stored, and the parallelization engine to use. These can be overridden in the experiment config files as well as the custom configuration files. \n",
    "\n",
    "Typically, for each custom class that is written, a similar config file will be written at the same place containing all of the parameters that are injected into the constructor. For example, for pipelines using the [raster model](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/training/modeling/models/raster_model.py), the corresponding [raster_model.yaml](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/script/config/common/model/raster_model.yaml) file will be sourced. This contains information about which class to construct at runtime (via the `_target_` parameter), as well as the parameters to inject into the constructor. Once written, this yaml file can be used in the training pipeline by overriding the default value, either on the command line or via an [override configuration yaml](https://github.com/motional/nuplan-devkit/blob/master/nuplan/planning/script/experiments/training/training_raster_model.yaml). "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "172a3f45",
   "metadata": {},
   "source": [
    "# Running Training<a name=\"run_training\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9bcdc998",
   "metadata": {},
   "source": [
    "Once all of the classes above have been implemented and the configuration files created, training is as simple as running this command from an activated `nuenv` conda environment:\n",
    "\n",
    "```\n",
    "python planning/script/run_training.py +training=the_new_yaml_file\n",
    "```\n",
    "\n",
    "Hydra args can be overridden using the usual hydra syntax. For example, to disable parallelization and run everything single-threaded (e.g. for debugging), the following command can be used\n",
    "\n",
    "```\n",
    "python planning/script/run_training.py +training=the_new_yaml_file worker=sequential\n",
    "```\n",
    "\n",
    "Some common pitfalls:\n",
    "* If preprocessing takes a large amount of time, it can cause training to fail (especially in a distributed setting). It may be beneficial to create the feature cache by first running the caching with the argument `py_func=cache cache.force_feature_computation=True`. This will generate the features using CPU, which should speed up training dramatically. Once caching is complete, supply the overrides `cache.force_feature_computation=False cache.cache_path=/path/to/cache cache.use_cache_without_dataset=True` to avoid re-computing the features.\n",
    "* Tokens are arbitrary hex strings, which could happen to be all numbers. Occasionally, this can cause hydra to misinterpret the strings as numbers. This can be fixed by explicitly quoting all tokens when used in parameters, e.g. `scenario_filter.scenario_tokens='[\"48681125850853e4\", \"48681125851853e4\"]'`\n",
    "* The parallel engine defaults to `ray_distributed`, which is the most flexible but can sometimes cause segfaults when running on extremely large datasets. For these scenarios try using the overrides `worker=single_machine_thread_pool worker.use_process_pool=True`\n",
    "\n",
    "The training output directory will have the materialized hydra config, as well as the trained model checkpoint on the conclusion of training. This can then be used [in the simulation pipeline](https://github.com/motional/nuplan-devkit/blob/master/tutorials/nuplan_planner_tutorial.ipynb) to evaluate the performance of the model. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "nuplan",
   "language": "python",
   "name": "nuplan"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
